This notebook is trying to reproduce graphs on the page: http://berkeleyearth.org/results-summary/
We have downloaded the annual-with-forcing-small.png
and decadal-with-forcing-small.png
images and we are trying to reproduce them
below using the original data. The page says that these graphs were constructed using the Full_TAVG_complete.txt
file. As we show below, this file does not seem to reproduce the graphs. However, the page also references an Excel file http://berkeleyearth.org/xls/forcing-comparison.xlsx
, so we have saved the numbers for temperature, volcanic and CO2 concentrations from it into best.data
. This file does seem to reproduce all the plots, see below. Although some confusion remains over what exact averaging was done in the original plots.
In [1]:
%pylab inline
from IPython.display import Image
def running_average(temp, n=5, m=6, unc=False):
"""
Calculates centered running average from -n to +m points (total of n+m+1 points).
Example 1: if n=m=6 and the data is monthly, then the result is a centered 13 month running average.
Example 2: if n=5, m=6 and the data is monthly, then the result is a "centered" 12 month moving average
(another way to center it is with n=6, m=5).
unc ... if True, calculate the average of uncertainties
"""
avg = temp.copy()
for i in range(n, size(temp)-m):
window = temp[i-n:i+m+1]
if unc:
avg[i] = sqrt(sum(window**2))/len(window)
else:
avg[i] = average(window)
avg[:n] = NaN
avg[-m:] = NaN
return avg
def month_to_year(temp, unc=False):
return running_average(temp, 5, 6, unc) # 5+6+1 = 12 months
def month_to_10years(temp, unc=False):
return running_average(temp, 59, 60, unc) #59+60+1 = 120 months
def month_to_20years(temp, unc=False):
return running_average(temp, 119, 120, unc) #119+120+1 = 240 months
In [2]:
# This file contains monthly temperature data
D = loadtxt("data/Full_TAVG_complete.txt", comments="%")
# Read from file:
T0 = 8.79
# Construct floating point years using years + months
years_int = D[:, 0]
months = D[:, 1]
years2 = years_int + (months-1) / 12
temp_month = D[:, 2] + T0
unc_month = D[:, 3]
temp1 = D[:, 4] + T0
unc1 = D[:, 5]
temp10 = D[:, 8] + T0
unc10 = D[:, 9]
temp20 = D[:, 10] + T0
unc20 = D[:, 11]
The raw data in Full_TAVG_complete.txt
are the temp_month
and unc_month
arrays, which give monthly temperature data and uncertainties. The rest of the data in the file are various running averages, that can be calculated using the month_to_year
, month_to_10years
and month_to_20years
functions, as verified by the following error plots (all errors must be less than $10^{-3}$ as that is the accuracy given in the text file). I haven't figured out how to reproduce the averaged uncertainty yet.
In [3]:
semilogy(years2, temp1-month_to_year(temp_month))
title("month_to_year")
show()
semilogy(years2, temp10-month_to_10years(temp_month))
title("month_to_10years")
show()
semilogy(years2, temp20-month_to_20years(temp_month))
title("month_to_20years")
show()
Conclusion: the file Full_TAVG_complete.txt
contains monthly data and we know how to calculate the various moving averages from it.
The best.data
file contains 12 month average of temperatures, CO2 and volcanic data.
In [4]:
# This file contains 12 month moving average of temperatures, co2 and volcanic
D = loadtxt("data/best.data")
years = D[:, 0]
log_CO2 = D[:, 1]
volcanic = D[:, 2]
temp = D[:, 3]
unc = D[:, 4]
In [5]:
D = loadtxt("data/berkeley_co2.txt")
years_ = D[:-2, 0]
CO2_ = D[:-2, 1]
D = loadtxt("data/berkeley_volcanic.txt")
years_2 = D[:, 0]
volcanic_ = D[:, 1]
In [6]:
plot(years_, CO2_, label="orig")
plot(years_, month_to_year(CO2_), "r-", label="12-month moving average")
xlim([1940, 2013])
ylim([300, 400])
legend(loc="upper left");
Let's verify this for CO2. We plot the CO2 concentrations
from the best.data
file an compare against monthly, annual and 10 year averages of CO2 data from Mauna Loa (we shifted the curves, so that one can compare the shapes).
From the graph it is clear that the 12 month average exactly agrees with the data from best.data
.
In [7]:
D = loadtxt("data/co2_mm_mlo.txt")
years3 = D[:, 2]
trend3 = D[:, 5]
CO2 = exp(log_CO2) * 277.3 # We convert from log CO2
plot(years, CO2, label="best.data")
plot(years3, trend3-2, label="monthly")
plot(years3, month_to_year(trend3)+2, label="12 month average")
plot(years3, month_to_10years(trend3)+4, label="10 year average")
xlim([1950, 2015])
ylim([300, 400])
title("CO2 graphs")
xlabel("Year")
ylabel("ppm")
legend(loc="upper left");
Let's also verify this for the temperatures against the monthly data from Full_TAVG_complete.txt
. It is clear that the best agreement is with the 12-month averaged data.
In [8]:
plot(years2, temp_month, label="monthly")
plot(years, temp, label="best.data")
plot(years2, month_to_year(temp_month), label="12 month average")
plot(years2, month_to_10years(temp_month), "y-", label="10 year average")
title("Temperature graphs")
xlabel("Year")
ylabel("Temperature")
legend(loc="lower right");
Unfortunately, the agreement is not perfect:
In [9]:
plot(years, temp, label="best.data")
plot(years2, month_to_year(temp_month), label="12 month average")
title("Temperature graphs")
xlabel("Year")
ylabel("Temperature")
legend(loc="lower right");
We can shift the 12-month average by 0.15 up, then it agrees quite well for years > 1950. But there is significant disagreement for earlier years.
In [10]:
plot(years, temp, label="best.data")
plot(years2, month_to_year(temp_month)+0.15, label="12 month average shifted by 0.15")
title("Temperature graphs")
xlabel("Year")
ylabel("Temperature")
legend(loc="lower right");
Conclusion: the file best.data
contains yearly (12-month) averages of temperature, CO2 and volcanic data. We can reprouce the CO2 data from Mauna Loa exactly,
an we can reproduce the yearly temperature data almost exactly for years > 1950, but we see disagreements for earlier years.
Let's calculate the fit using the $\alpha$, $\beta$ and $\gamma$ values provides in the Excel file:
In [11]:
alpha = 8.3421049
beta = 4.4663687
gamma = -0.0151461
fit = alpha + beta * log(CO2_ / 277.3) + gamma * volcanic_
We plot the annual data from best.data
(i.e. we plot it directly, no averaging is done). This seems to agree well with the original graph:
In [12]:
figure(figsize=(8, 6))
fill_between(years, temp+unc, temp-unc, color="gray")
plot(years, temp, "k-")
plot(years_, month_to_year(fit), "r-", lw=2)
xlabel("Year")
ylabel("Annual temperature [$^\circ C$]")
title("Berkeley Averaging method")
xlim([1750, 2013])
ylim([6.5, 10.5])
grid()
show()
Image(filename="data/annual-with-forcing-small.png")
Out[12]:
For comparison, we plot the same graph, but using temperatures from the file Full_TAVG_complete.txt
(averaged over 12 months). This does not agree well (as is to be expected per our analysis of the files above):
In [13]:
figure(figsize=(8, 6))
fill_between(years2, temp1+unc1, temp1-unc1, color="gray")
plot(years2, temp1, "k-")
plot(years_, month_to_year(fit), "r-", lw=2)
xlabel("Year")
ylabel("Annual temperature [$^\circ C$]")
title("Berkeley Averaging method")
xlim([1750, 2013])
ylim([6.5, 10.5])
grid()
show()
Image(filename="data/annual-with-forcing-small.png")
Out[13]:
Now we try to reproduce the other graph. We plot a 10 year average. This seems to agree well:
In [14]:
figure(figsize=(8, 6))
# the uncertainties must be averaged differently, but to a first approximation this is ok:
fill_between(years, month_to_10years(temp+unc), month_to_10years(temp-unc), color="gray")
# TODO: we need to get "temp" on monthly data
plot(years, month_to_10years(temp), "k-")
plot(years_, month_to_10years(fit), "r-", lw=2)
xlabel("Year")
ylabel("10 year moving average [$^\circ C$]")
title("Berkeley Averaging method")
xlim([1750, 2013])
ylim([7.5, 10])
grid()
show()
Image(filename="data/decadal-with-forcing-small.png")
Out[14]:
Let's also the data from Full_TAVG_complete.txt
and calculate 10-year average. The temperature data is shifted:
In [15]:
figure(figsize=(8, 6))
fill_between(years2, temp10+unc10, temp10-unc10, color="gray")
plot(years2, month_to_10years(temp_month), "k-", label="Temperature (10 year average)")
plot(years_, month_to_10years(fit), "r-", lw=2, label="fit (10 year moving average)")
xlabel("Year")
ylabel("Decadal temperature [$^\circ C$]")
title("Berkeley Averaging method")
xlim([1750, 2013])
ylim([7.5, 10])
grid()
legend(loc="lower right")
show()
Image(filename="data/decadal-with-forcing-small.png")
Out[15]:
The only discrepancy is the temperature data, otherwise we seem to agree pretty well.